Improving target language modeling techniques for statistical machine translation
Abstract
The aim of this study is to find ways of improving target language modeling (TLM) applied to statistical machine translation (SMT). We describe current research activities dedicated to TLM improvement that are applied to the 2007 n-gram-based statistical machine translation system developed at the TALP Research Center of the Technical University of Catalonia (UPC). We consider two new language modeling improvement techniques: threshold-based TLM pruning and TLM based on statistical classes. Some of the research is still in progress. In this paper we describe some of the major problems faced and outline possible solutions and plans for future research. We report results for the Spanish-English and English-Spanish language pairs from the official TC-STAR 2006 evaluation.
Similar resources
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Improving Translation to Morphologically Rich Languages (Améliorer la traduction des langages morphologiquement riches) [in French]
While statistical techniques for machine translation have made significant progress in the last 20 years, results for translating to morphologically rich languages are still mixed versus previous generation rule-based systems. Current research in statistical techniques for translating to morphologically rich languages varies greatly ...
Using Related Languages to Enhance Statistical Language Models
The success of many language modeling methods and applications relies heavily on the amount of data available. This problem is further exacerbated in statistical machine translation, where parallel data in the source and target languages is required. However, large amounts of data are only available for a small number of languages; as a result, many language modeling techniques are inadequate f...
Topic and Sentiment in Phrase-based Statistical Machine Translation
In this paper, we model two textual properties, topic and sentiment, at the sentence and document levels, with the goal of improving the performance of machine translation by taking into account this information in source and target sentences. In the topical similarity approach, we augment the source sentence with the keywords extracted from its adjacent sentences and re-rank the candidate targ...
Integration of ASR and machine translation models in a document translation task
This paper is concerned with the problem of machine aided human language translation. It addresses a translation scenario where a human translator dictates the spoken language translation of a source language text into an automatic speech dictation system. The source language text in this scenario is also presented to a statistical machine translation system (SMT). The techniques presented in t...